Score Distributions in Information Retrieval
Abstract
We review the history of modeling score distributions, focusing on the normal-exponential mixture and examining the theoretical and empirical evidence supporting its use. We discuss previously suggested conditions that valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses concerning the component distributions under limiting conditions of parameter values. Of all the mixtures suggested in the past, the current theoretical argument points to the two-gamma mixture as the most likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing that vector space or geometric models, and BM25, are “friendly” to the normal-exponential, and that the mixture's non-convexity problem is not severe in practice.
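As a rough illustration of the non-convexity issue mentioned in the abstract, the sketch below computes the posterior probability of relevance under a normal-exponential mixture (normal for relevant scores, exponential for non-relevant scores). All parameter values (`MU`, `SIGMA`, `LAM`, `PRIOR`) and function names are assumptions chosen for illustration, not values or code from the paper. Because the normal tail decays faster than the exponential tail, the posterior rises, peaks, and then falls again at very high scores, which is the behaviour the Recall-Fallout Convexity Hypothesis rules out.

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of a normal distribution (assumed model for relevant scores)."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(s, lam):
    """Density of an exponential distribution (assumed model for non-relevant scores)."""
    return lam * math.exp(-lam * s) if s >= 0 else 0.0


def posterior_relevance(s, mu, sigma, lam, prior):
    """P(relevant | score) under the two-component mixture, via Bayes' rule."""
    rel = prior * normal_pdf(s, mu, sigma)
    non = (1.0 - prior) * exponential_pdf(s, lam)
    total = rel + non
    return rel / total if total > 0 else 0.0


# Illustrative (hypothetical) parameters: 10% of documents are relevant.
MU, SIGMA, LAM, PRIOR = 3.0, 1.0, 1.0, 0.1

scores = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
posteriors = [posterior_relevance(s, MU, SIGMA, LAM, PRIOR) for s in scores]
peak_score = scores[posteriors.index(max(posteriors))]

for s, p in zip(scores, posteriors):
    print(f"score={s:4.1f}  P(rel|score)={p:.4f}")
print("posterior peaks at score", peak_score)
```

With these assumed parameters the posterior is not monotone in the score: documents scoring well above the peak are assigned a lower probability of relevance than documents near it. How much this matters in practice depends on whether real scores ever reach that region.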
Similar resources
Using Models of Score Distributions in Information Retrieval
Empirical modeling of a number of different text search engines shows that the score distributions on a per query basis may be fitted approximately using an exponential distribution for the set of nonrelevant documents and a normal distribution for the set of relevant documents. This model fits not only probabilistic search engines like INQUERY but also vector space search engines like SMART an...
On Score Distributions and Relevance
We discuss the idea of modelling the statistical distributions of scores of documents, classified as relevant or non-relevant. Various specific combinations of standard statistical distributions have been used for this purpose. Some theoretical considerations indicate problems with some of the choices of pairs of distributions. Specifically, we revisit a generalisation of the well-known inverse...
Score Following and Retrieval Based on Chroma and Octave Representation
Building on studies of effective representations of music signals and music scores, namely chroma and octave features, this work addresses score following and score retrieval. To compensate for the shortcomings of the chromagram representation, energy distributions in different octaves are used to describe tone height information. By transforming music signals and scores into sequences of feature vectors, score f...
Matching Scores of System Relevance and User-Oriented Relevance in SID, ISC and Google Scholar
Background and Aim: The main aim of information storage and retrieval systems is to store and retrieve relevant information, that is, to provide documents that match users’ needs or requests. This study aimed to determine how closely system relevance and user-oriented relevance match in the SID, ISC and Google Scholar databases. Method: In this study 15 keywords of ...
Score Standardization for Robust Comparison of Retrieval Systems
Information retrieval systems are evaluated by applying them to standard test collections of documents, topics, and relevance judgements. An evaluation metric is then used to score a system’s output for each topic; these scores are averaged to obtain an overall measure of effectiveness. However, different topics have differing degrees of difficulty and differing variability in scores, leading t...
Measuring the Ability of Score Distributions to Model Relevance
Modelling the score distribution of documents returned from any information retrieval (IR) system is of both theoretical and practical importance. The goal is to be able to infer relevant and nonrelevant documents from their scores with some degree of confidence. In this paper, we show how the performance of mixtures of score distributions can be compared using inference of query perf...